Analyzing Seattle Public Library Checkouts: Trends, Insights, and Future Forecasting

The Seattle Public Library (SPL) plays a vital role in providing access to a diverse range of resources, including books, audiobooks, eBooks, music, movies, and more. The library's mission is to empower the community by offering free access to information, fostering literacy, and promoting lifelong learning. Understanding user behavior and checkout patterns is crucial for optimizing library services and ensuring that SPL continues to meet the evolving needs of its patrons.

Objective

This project analyzes Seattle Public Library checkout data for the six calendar years from 2018 through 2023. The goal is to uncover trends, gain insight into user preferences, and forecast future checkouts. The analysis is intended to help the library make informed decisions about resource allocation, collection development, and user engagement strategies.

Distribution of Checkouts by Material Type

Insight: EBooks and traditional Books dominate the checkouts, indicating a strong preference for reading materials among library users. Audiobooks also show significant usage, reflecting the growing popularity of audio content. Visual media like movies and TV shows have lower checkouts, likely due to the availability of streaming services.
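
To put a number on "dominate", one option is to sum the `checkouts` column per material type and express each total as a share of all checkouts. The sketch below uses a small hand-made DataFrame (the values are illustrative, not SPL data) with the `materialtype` and `checkouts` column names used in the appendix code; the helper name `checkout_share_by_type` is hypothetical. Summing checkouts is not the same as counting rows, since a single row can represent many checkouts.

```python
import pandas as pd

# Illustrative rows only; real data would come from the SPL checkouts dataset.
sample = pd.DataFrame({
    "materialtype": ["BOOK", "EBOOK", "BOOK", "AUDIOBOOK", "VIDEODISC"],
    "checkouts": [10, 30, 6, 15, 2],
})

def checkout_share_by_type(df: pd.DataFrame) -> pd.DataFrame:
    """Sum checkouts per material type and add each type's share of the total."""
    totals = (
        df.groupby("materialtype")["checkouts"]
        .sum()
        .sort_values(ascending=False)
    )
    return pd.DataFrame({
        "checkouts": totals,
        "share_pct": (totals / totals.sum() * 100).round(1),
    })

shares = checkout_share_by_type(sample)
print(shares)
```

A table like this makes it possible to state, for example, what percentage of all checkouts each material type accounts for, rather than relying on the bar heights alone.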

Monthly Trends in Checkouts

Insight: The time series analysis reveals clear seasonal peaks, especially at the beginning of each year. The impact of the COVID-19 pandemic is evident with a significant drop in checkouts in 2020, followed by a gradual recovery.
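
The size of a drop like the one in 2020 can be stated as a year-over-year percent change in total checkouts. The following is a minimal sketch on a synthetic monthly series (the numbers are made up for illustration, and the helper name `yearly_change` is an assumption); in the notebook, the same function could be applied to the `monthly_trends` series built in the appendix.

```python
import pandas as pd

# Illustrative monthly totals, not SPL data: 24 months with a lower second year.
idx = pd.date_range("2019-01-01", periods=24, freq="MS")
monthly = pd.Series([100] * 12 + [60] * 12, index=idx, name="checkouts")

def yearly_change(series: pd.Series) -> pd.DataFrame:
    """Total per calendar year and percent change versus the previous year."""
    totals = series.groupby(series.index.year).sum()
    return pd.DataFrame({
        "total": totals,
        "pct_change": (totals.pct_change() * 100).round(1),
    })

summary = yearly_change(monthly)
print(summary)
```

The first year has no previous year to compare against, so its percent change is left as NaN.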

Advanced Time Series Analysis - Seasonal Decomposition

Seasonal Decomposition of Monthly Checkouts

Insight: The seasonal decomposition highlights strong seasonal patterns, with notable peaks at regular intervals. The trend component shows a decline during the pandemic, followed by a recovery. The residuals indicate some anomalies, suggesting occasional unexpected variations.
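
One way to make "anomalies" in the residuals concrete is to flag months whose residual lies more than a chosen number of standard deviations from the residual mean. The sketch below assumes a residual series such as `result.resid` from the appendix; here it runs on synthetic residuals with one injected outlier. The helper name `flag_residual_anomalies` and the default threshold of 3 are assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd

def flag_residual_anomalies(resid: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Return residuals whose z-score magnitude exceeds `threshold`."""
    clean = resid.dropna()  # decomposition residuals can be NaN at the series ends
    z = (clean - clean.mean()) / clean.std()
    return clean[z.abs() > threshold]

# Illustrative residuals, not SPL data: small noise plus one injected outlier.
rng = np.random.default_rng(0)
idx = pd.date_range("2018-01-01", periods=48, freq="MS")
resid = pd.Series(rng.normal(0, 1, size=48), index=idx)
resid.iloc[27] = 15.0  # injected anomaly at 2020-04

anomalies = flag_residual_anomalies(resid)
print(anomalies)
```

A large outlier inflates the standard deviation it is measured against, so a robust alternative (for example, one based on the median absolute deviation) may be worth considering for short series.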

Forecasting Future Checkouts

Insight: The forecast predicts a significant spike in checkouts around early 2024, aligning with the historical seasonal peaks. This information can help the library prepare for increased demand and optimize resource allocation.
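
Before acting on a forecast, it is common to check the approach against a held-out period and a simple baseline. The sketch below holds out the last 12 months of a synthetic series and scores a seasonal-naive forecast (each month repeats the value from 12 months earlier) by mean absolute error. The helper names and the data are assumptions for illustration; the same `mean_absolute_error` helper could be used to score the Holt-Winters forecast from the appendix on a held-out year.

```python
import numpy as np
import pandas as pd

def seasonal_naive_forecast(train: pd.Series, horizon: int, period: int = 12) -> pd.Series:
    """Forecast by repeating the last full seasonal cycle of `train`."""
    last_cycle = train.iloc[-period:].to_numpy()
    values = np.resize(last_cycle, horizon)  # repeat the cycle out to `horizon` steps
    future_index = pd.date_range(train.index[-1], periods=horizon + 1, freq="MS")[1:]
    return pd.Series(values, index=future_index)

def mean_absolute_error(actual: pd.Series, predicted: pd.Series) -> float:
    """Average absolute difference between index-aligned actual and predicted values."""
    return float((actual - predicted).abs().mean())

# Illustrative series, not SPL data: a repeating yearly pattern, shifted up in year 3.
idx = pd.date_range("2020-01-01", periods=36, freq="MS")
pattern = np.array([120, 100, 95, 90, 85, 80, 82, 84, 88, 92, 96, 110])
series = pd.Series(np.tile(pattern, 3) + np.repeat([0, 0, 5], 12), index=idx)

train, test = series.iloc[:-12], series.iloc[-12:]
baseline = seasonal_naive_forecast(train, horizon=12)
mae = mean_absolute_error(test, baseline)
print(mae)
```

If a fitted model cannot beat this baseline on held-out months, its predicted spike deserves less weight in planning decisions.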

Conclusion

Summary of Findings:

Future Work:

By leveraging these insights, the Seattle Public Library can enhance its operational efficiency, user engagement, and overall service delivery, ensuring it meets the evolving needs of its community effectively.

Appendix

Jupyter Notebook Code Snippets:

import os
import warnings
from datetime import datetime

import matplotlib.pyplot as plt
import pandas as pd
import plotly.express as px
import plotly.io as pio
from dotenv import load_dotenv
from sodapy import Socrata
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.seasonal import seasonal_decompose

# Suppress future warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# Load environment variables from .env file
load_dotenv()

# Credentials read from the environment (only app_token is used below)
app_token = os.getenv("APP_TOKEN")
api_key = os.getenv("API_KEY")
api_secret = os.getenv("API_SECRET")
username = os.getenv("USERNAME")
password = os.getenv("PASSWORD")

# Data Retrieval
# Client identified by an app token; no username/password is passed,
# so requests are not authenticated as a user
client = Socrata("data.seattle.gov", app_token)

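# Function to fetch all rows for a specific year, one page at a time.
# Note: a full year may span many pages, so this can take many requests.
def fetch_data_for_year(year, limit=1000):
    offset = 0
    year_data = []
    while True:
        # A stable sort order keeps pages from overlapping or skipping rows
        results = client.get("tmmm-ytt6", limit=limit, offset=offset,
                             where=f"checkoutyear={year}", order=":id")
        year_data.extend(results)
        if len(results) < limit:
            break  # a short page means the last page was reached
        offset += limit
    return year_data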


# Fetch each year from 2018 through 2023 and combine the rows
all_data = []
for year in range(2018, 2024):
    all_data.extend(fetch_data_for_year(year))
df = pd.DataFrame.from_records(all_data)

# Data Cleaning
df = df.dropna(subset=['title', 'checkouts'])
# Build a first-of-month date from the year and month columns
df['checkoutdate'] = pd.to_datetime(df['checkoutyear'].astype(str) + '-' + df['checkoutmonth'].astype(str) + '-01')
df['checkouts'] = pd.to_numeric(df['checkouts'])

# Distribution of Checkouts by Material Type
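# Sum the checkouts column per material type; value_counts() would count
# rows (records) rather than checkouts
material_counts = df.groupby('materialtype')['checkouts'].sum().sort_values(ascending=False)
fig1 = px.bar(x=material_counts.index.tolist(), y=material_counts.values,
              title="Distribution of Checkouts by Material Type",
              labels={'x': 'Material Type', 'y': 'Number of Checkouts'},
              text=material_counts.values)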
fig1.update_layout(title_x=0.5, title_font=dict(size=20), xaxis_title_font=dict(size=15), yaxis_title_font=dict(size=15))

# Save Plotly figure as HTML div
plot1_html = pio.to_html(fig1, full_html=False)

# Monthly Trends in Checkouts
monthly_trends = df.groupby(df['checkoutdate'].dt.to_period('M'))['checkouts'].sum().to_timestamp()
date_range = pd.date_range(start=monthly_trends.index.min(), end=monthly_trends.index.max(), freq='MS')
monthly_trends = monthly_trends.reindex(date_range, fill_value=0)

fig2 = px.line(x=monthly_trends.index, y=monthly_trends.values, title="Monthly Trends in Checkouts",
               labels={'x': 'Date', 'y': 'Number of Checkouts'})
fig2.update_layout(
    title_x=0.5,
    title_font=dict(size=20),
    xaxis_title='Date',
    yaxis_title='Number of Checkouts',
    xaxis_title_font=dict(size=15),
    yaxis_title_font=dict(size=15)
)

# Save Plotly figure as HTML div
plot2_html = pio.to_html(fig2, full_html=False)

# Advanced Time Series Analysis - Seasonal Decomposition
result = seasonal_decompose(monthly_trends, model='additive', period=12)

fig, axes = plt.subplots(4, 1, figsize=(10, 8), sharex=True)
result.observed.plot(ax=axes[0], title='Observed', fontsize=12, color='blue')
result.trend.plot(ax=axes[1], title='Trend', fontsize=12, color='green')
result.seasonal.plot(ax=axes[2], title='Seasonal', fontsize=12, color='orange')
result.resid.plot(ax=axes[3], title='Residual', fontsize=12, color='red')


fig.supylabel('Number of Checkouts', fontsize=14)
plt.suptitle('Seasonal Decomposition of Monthly Checkouts', fontsize=14, x=0.5)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.savefig('seasonal_decomposition.png')
plt.close()

# Forecasting Future Checkouts
model = ExponentialSmoothing(monthly_trends, seasonal='additive', seasonal_periods=12)
fit = model.fit()
forecast = fit.forecast(12)

fig3 = px.line(x=monthly_trends.index.tolist() + forecast.index.tolist(), y=monthly_trends.tolist() + forecast.tolist(), 
               title="Forecasting Future Checkouts",
               labels={'x': 'Date', 'y': 'Number of Checkouts'})
fig3.add_scatter(x=forecast.index, y=forecast, mode='lines', name='Forecast')
fig3.update_layout(title_x=0.5, title_font=dict(size=20), xaxis_title_font=dict(size=15), yaxis_title_font=dict(size=15))

# Save Plotly figure as HTML div
plot3_html = pio.to_html(fig3, full_html=False)